(57) ABSTRACT

The present invention is directed to rearranging the
transmission order of the enhancement-layer frames. By making
the display and transmission order of the enhancement layer
frames identical, a frame memory is not required on the
decoder-side to hold the enhancement-layer frame until
being displayed since the display can take place immediately
after the decoding. Reducing the amount of memory is
desirable for mobile applications or other low-power
consumption devices.

7 Claims, 8 Drawing Sheets 



[Representative drawing: enhancement layer and base layer frames shown in display order and in transmission order]



07/07/2004, EAST Version: 1.4.1 



U.S. Patent Feb. 24, 2004 Sheet 1 of 8 US 6,697,426 B1

[FIG. 1 (PRIOR ART): enhancement layer and base layer frames shown in display order and in transmission order]



[Sheet 2 of 8. FIG. 2 (PRIOR ART): decoder block diagram. Labeled elements: enhancement layer bitstream, bit-plane VLD, bit-plane memory, bit-plane decoding, IDCT, frame memory; base layer bitstream, inverse quantization, IDCT, adder]



[Sheet 3 of 8. FIG. 3: enhancement layer and base layer frames shown in display order and in transmission order]



[Sheet 4 of 8. FIG. 4: decoder block diagram. Labeled elements: enhancement layer bitstream with transmission order, bit-plane VLD, bit-plane memory, bit-plane decoding, IDCT, enhanced video; base layer bitstream, VLD, inverse quantization, IDCT, MC, frame memory]



[Sheet 5 of 8. FIG. 5: transmission timing of the enhancement layer and base layer frames]

Time period:         T1   T2   T3   T4   T5   T6
Enhancement layer:        E1   E2   E3   E4   E5 ...
Base layer:          I1   P3   B2   P5   B4   P6 ...



[Sheet 6 of 8. FIG. 6: encoder block diagram. Labeled elements: FGS enhancement layer (EL) encoder with FGS encoder, frame memory (used for bitplane by bitplane encoding), frame memory (used for reordering the EL frames in display order), EL stream; base layer (BL) encoder with original input, quantization Q, inverse quantization Q^-1, entropy encoder, motion compensation, motion estimation, MVs, frame memory, MUX, BL stream]



[Sheet 7 of 8. FIG. 7: enhancement layer and base layer frames shown in display order and in transmission order]



[Sheet 8 of 8. FIG. 8: system block diagram with input video/image and output video/image]



REDUCTION OF LAYER-DECODING 
COMPLEXITY BY REORDERING THE 
TRANSMISSION OF ENHANCEMENT 
LAYER FRAMES 

CROSS REFERENCE TO RELATED 
APPLICATIONS 

The present application claims the benefit of U.S. Provi- 
sional Application Serial No. 60/190,368, filed on Mar. 17, 
2000. 

BACKGROUND OF THE INVENTION 

The present invention generally relates to video coding, 
and more particularly to rearranging the transmission order 
of enhancement layer frames. 

In MPEG-4 base-layer decoders as well as MPEG-2 
decoders for that matter, the transmission order of the 
various frames differs from the display order. An example of 
this is shown in FIG. 1. As can be seen, the transmission 
order of both the base layer frames and corresponding 
enhancement layer frames differs from the display order. 

The reason for the rearrangement of the frames of FIG. 1
is that the bi-directional motion compensation (MC)
employed for the B-frames requires the anchor frames (I- and
P-frames) on which the prediction is made to be already
available in the memory at the encoder/decoder side when
the B-frames are encoded/decoded. This requires the I- and
P-frames to be transmitted to the decoder prior to the
B-frames. However, since the B-frames are typically
displayed between the I- and P-frames, the transmission and
display order of the frames differ due to the
MC-prediction.
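
The reordering described above can be sketched in a few lines. This is a minimal illustration only, assuming frames are named by type and display index (I1, B2, P3, ...) and that each B-frame is predicted from the nearest anchor on either side; the function name `to_transmission_order` is hypothetical and does not come from the patent.

```python
def to_transmission_order(display_order):
    """Move each anchor (I/P) frame ahead of the B-frames that precede it in display order."""
    transmission = []
    pending_b = []  # B-frames waiting for the anchor that follows them
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            transmission.append(frame)      # the anchor is sent first
            transmission.extend(pending_b)  # then the B-frames that depend on it
            pending_b = []
    transmission.extend(pending_b)  # trailing B-frames with no later anchor
    return transmission


display = ["I1", "B2", "P3", "B4", "P5"]
print(to_transmission_order(display))  # ['I1', 'P3', 'B2', 'P5', 'B4']
```

For the five-frame sequence used in the figures, the anchors P3 and P5 each move ahead of the B-frame displayed before them.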

A block diagram of one example of a scalable (layered)
decoder is shown in FIG. 2. During operation, the decoder
2 receives the encoded base and enhancement layer frames
in the transmission order shown in FIG. 1. Further, the
decoder 2 will decode and reorder these frames into the
display order shown in FIG. 1.

As can be seen, the decoder 2 includes two separate paths
for decoding the base layer and enhancement layer bit
streams. Since these two paths are separate, the decoding
processes of the two streams do not need to be
synchronized.

The path for the base layer stream includes a variable
length decoder 4, an inverse quantization block 6 and an
inverse discrete cosine transform (IDCT) block 8 to convert
the base layer bit-stream into picture frames. A motion
compensation block 12 is also included for performing
motion compensation on picture frames previously stored in
a frame memory 14 based on the received motion vectors.
Further, an adder 10 is also included to combine the outputs
of the IDCT block 8 and the motion compensation block 12.

The path for the enhancement layer stream includes a
variable length decoder (VLD) 15, a bit plane decoding block
17 and another IDCT block 18 to convert the enhancement
layer bit-stream into picture frames. During operation, the
bit-plane decoding block 17 will decode the output of the
variable length decoder 15 into individual bit planes using
any suitable fine granular scalable decoding technique.

As can be further seen, a bit plane memory 16 is also 
included to store the individual bit planes until all of the bit 
planes for a current frame are decoded. Further, after the 
IDCT block 18 a frame memory 22 is included. The frame 
memory 22 is used to compensate for the encoded frames 
being received in a transmission order different from the 
display order, as shown in FIG. 1. 
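
The role of the bit plane memory can be illustrated with a small sketch. The source leaves the fine granular scalable technique open, so the following rests on assumptions: planes arrive most-significant first, only coefficient magnitudes are handled (sign coding is omitted), and the function name `accumulate_bit_planes` is hypothetical.

```python
def accumulate_bit_planes(planes, num_planes):
    """Rebuild coefficient magnitudes from bit planes received MSB first.

    `planes` may hold fewer than `num_planes` entries (a truncated stream);
    missing low-order planes are treated as zero.
    """
    coeffs = [0] * len(planes[0]) if planes else []
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i  # binary weight of this plane
        for k, bit in enumerate(plane):
            coeffs[k] |= bit << shift
    return coeffs


# Three planes for four coefficients with magnitudes 5, 2, 7 and 0
planes = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 1, 0]]
print(accumulate_bit_planes(planes, 3))      # [5, 2, 7, 0]
print(accumulate_bit_planes(planes[:2], 3))  # [4, 2, 6, 0]
```

The second call shows why the planes are held until the frame is complete: a frame rebuilt from fewer planes is a coarser version of the same residual.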
For example, if the enhancement layer frames are
transmitted at the same time instance as the corresponding
base-layer frames, the frame memory 22 is required to store
the enhancement-layer frames until their display time, which
coincides with the base-layer display time. Referring back to
the transmission order of FIG. 1, the enhancement frame E3,
after being decoded, is stored in the frame memory 22 until
after the enhancement frame E2 is decoded and displayed.
Thereafter, the enhancement frame E3 is retrieved from the
frame memory 22 and then displayed. In this manner, the
transmission order of the frames is converted into the
display order, as shown in FIG. 1.

The decoder 2 also includes another adder 20 to combine
the picture frames from each of the paths in order to produce
enhanced video 24. The enhanced video 24 can be either
displayed immediately in real time or stored in an output
frame memory for display at a later time.
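
As a rough illustration of the adder 20, the per-sample combination might look like the following. This is a sketch only: the function name `combine_frames`, the flat list-of-samples frame representation and the clipping to an 8-bit range are assumptions, not details given in the source.

```python
def combine_frames(base_frame, enhancement_residual, max_value=255):
    """Add the enhancement residual to the base frame, sample by sample.

    Clipping to [0, max_value] is an assumption for 8-bit video.
    """
    if len(base_frame) != len(enhancement_residual):
        raise ValueError("frames must have the same number of samples")
    return [
        min(max(b + e, 0), max_value)
        for b, e in zip(base_frame, enhancement_residual)
    ]


print(combine_frames([100, 250, 3], [5, 10, -7]))  # [105, 255, 0]
```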

SUMMARY OF THE INVENTION 

The present invention is directed to a method for encoding
video data. The method includes coding a portion of the
video data to produce base layer frames, coding another
portion of the video data to produce enhancement layer
frames, and rearranging the enhancement layer frames into a
display order.

The present invention is also directed to a method for
decoding a video signal including a base layer and an
enhancement layer, where the enhancement layer includes
enhancement frames arranged in a display order. The
method includes decoding the base layer to produce decoded
base layer frames, decoding the enhancement layer to
produce decoded enhancement layer frames, and rearranging
the decoded base layer frames into the display order. The
method further includes combining the decoded base layer
frames with the decoded enhancement layer frames, without
storing any of the decoded enhancement layer frames, to form
video frames.

BRIEF DESCRIPTION OF THE DRAWINGS 

Referring now to the drawings, where like reference numbers
represent corresponding parts throughout:

FIG. 1 is a diagram showing the transmission and display 
order for a conventional encoding system; 

FIG. 2 is a block diagram showing one example of a
decoder;

FIG. 3 is a diagram showing one example of the trans- 
mission and display order according to the present inven- 
tion; 

FIG. 4 is a block diagram showing one example of a
decoder according to the present invention;

FIG. 5 is a diagram showing one example of the trans- 
mission timing of the frames according to the present 
invention; 

FIG. 6 is a block diagram showing one example of an
encoder according to the present invention;

FIG. 7 is a diagram showing another example of the
transmission and display order according to the present
invention; and

FIG. 8 is a block diagram showing one example of a 
system according to the present invention. 

DETAILED DESCRIPTION 

The present invention is directed to rearranging the
transmission order of coded enhancement-layer frames. By
making the display and transmission order of the enhancement
layer frames identical, a frame memory is no longer
necessary at the decoder side to hold the enhancement-layer
frames until being displayed, since the display can take place
immediately after the decoding. Reducing the amount of
memory is desirable for mobile applications or other
low-power consumption devices.

In the conventional encoding system, where the
enhancement layer transmission order is the same as for the
base layer, more than two frame stores are necessary for
decoding. Referring to FIG. 1, one frame memory is used to store
the E1 frame, one frame memory is used to store the E3
frame (which has been decoded, but cannot be displayed
until E2 is received, decoded and displayed) and one frame
memory is used for the decoding and storing of E2.
However, according to the present invention, the memory to
store the compressed E3 data is no longer necessary.

One example of the transmission and display order
according to the present invention is shown in FIG. 3. For
purposes of explanation, FIG. 3 only shows five base layer
frames and corresponding enhancement layer frames.
However, it should be noted that in an actual system the
present invention would be applied to a variety of different
group of pictures (GOP) structures.

As can be seen from FIG. 3, the transmission order of the
base layer frames is the same as in the conventional system
shown in FIG. 1. However, according to the present
invention, the transmission order of the enhancement frames
has been rearranged to be the same as the display order of
the enhancement frames on the decoder side, as shown in
FIG. 3.

By rearranging the transmission order of the enhancement
frames to be the same as the display order, no local memory
is necessary for the enhancement frames since the FGS
frames are displayed immediately after the decoding. Of
course, the display takes place after the FGS residual has
been added to the base-layer frame.
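
The memory saving can be checked with a small simulation. The sketch below assumes each decoded enhancement frame is displayed as soon as it is due and counts only frames that must wait for an earlier frame; the helper `peak_frames_waiting` is hypothetical and models neither decoding time nor the base layer.

```python
def peak_frames_waiting(arrival_order, display_order):
    """Return the largest number of decoded frames held while waiting to be displayed."""
    held = set()
    next_to_show = 0
    peak = 0
    for frame in arrival_order:
        held.add(frame)
        # Display every frame that is now due, in display order.
        while next_to_show < len(display_order) and display_order[next_to_show] in held:
            held.remove(display_order[next_to_show])
            next_to_show += 1
        peak = max(peak, len(held))
    return peak


display = ["E1", "E2", "E3", "E4", "E5"]
conventional = ["E1", "E3", "E2", "E5", "E4"]  # same order as the base layer
proposed = list(display)                        # transmission order equals display order

print(peak_frames_waiting(conventional, display))  # 1
print(peak_frames_waiting(proposed, display))      # 0
```

With the conventional order one frame (E3, then E5) always has to wait, which corresponds to the extra frame memory; with the rearranged order nothing waits.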

One example of a decoder according to the present
invention is shown in FIG. 4. As can be seen, the decoder 26
of this figure is the same as the conventional decoder of FIG.
2 except that a frame memory 22 at the output of the IDCT
block 18 is no longer required. As described above, this
frame memory is no longer required since the transmission
order of the enhancement frames has been rearranged to be
the same as the actual display order of the frames. Therefore,
the enhancement layer frames can be displayed in the
order received after being combined with the base layer
frames.

During operation, the decoder 26 will receive the base and
enhancement layer frames in the transmission order shown
in FIG. 3. However, in FIG. 3, the transmission order of the
base layer frames is different from that of the enhancement
layer frames. In order to compensate for this, the order of the
base layer frames is changed and the timing of the
enhancement layer frames is changed, as described below.

One example of the transmission timing of the
enhancement layer frames according to the present invention is
shown in FIG. 5. As can be seen, the transmission timing of
the enhancement layer frames is delayed with respect to the
corresponding base layer frames. In the first time period, the
base layer frame I1 is transmitted. Since the transmission of
the corresponding enhancement layer frame E1 has been
delayed to the next period, the decoder 26 of FIG. 4 will
decode the base layer frame I1 and just store it in the frame
memory 14 until the base layer frame P3 and the
enhancement frame E1 are received.

In the second time period of FIG. 5, the enhancement
layer frame E1 and the base layer frame P3 are transmitted. At
this time, the decoder 26 of FIG. 4 will decode the base layer
frame P3 and again just store it in the frame memory 14 until
the delayed enhancement frame E3 is received and decoded.
Further, the decoder 26 of FIG. 4 will decode the
enhancement layer frame E1 and combine it with the corresponding
base layer frame I1 previously stored in the frame memory
14 to form a frame of enhanced video.

In the third time period of FIG. 5, the base layer frame B2
and the corresponding enhancement layer frame E2 are
transmitted at the same time. Thus, the decoder 26 of FIG. 4 will
decode the base layer frame B2 and the corresponding
enhancement layer frame E2 at the same time and then
combine the decoded frames to form another frame of
enhanced video.

In the fourth time period of FIG. 5, the enhancement layer
frame E3 and the base layer frame P5 are transmitted. At this
time, the decoder 26 of FIG. 4 will decode the base layer
frame P5 and again just store it in the frame memory 14 until
the delayed enhancement frame E5 is received and decoded.
Further, the decoder 26 of FIG. 4 will decode the
enhancement layer frame E3 and combine it with the corresponding
base layer frame P3 previously stored in the frame memory
14 to form another frame of enhanced video. As can be seen
from FIG. 5, the above-described process will continue until
all of the enhancement and corresponding base layer frames
transmitted in the subsequent time periods are decoded and
combined to produce an enhanced video sequence.
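
The time periods walked through above can be replayed with a short simulation. It is a sketch under stated assumptions: frames are represented only by their names, a dictionary stands in for the frame memory 14 and never evicts old frames, the sequence is cut to the five frames of FIG. 3 so that the last period carries only E5, and the function name `decode_schedule` is hypothetical.

```python
def decode_schedule(slots):
    """Pair each enhancement frame with its base frame as soon as it arrives.

    `slots` is a list of (base_frame, enhancement_frame) names per time period;
    either entry may be None. No enhancement frame is ever buffered.
    """
    base_store = {}  # stands in for frame memory 14, keyed by display index
    output = []
    for base, enhancement in slots:
        if base is not None:
            base_store[base[1:]] = base  # "P3" is stored under "3"
        if enhancement is not None:
            output.append((base_store[enhancement[1:]], enhancement))
    return output


slots = [
    ("I1", None),  # first period: E1 is delayed
    ("P3", "E1"),
    ("B2", "E2"),
    ("P5", "E3"),
    ("B4", "E4"),
    (None, "E5"),  # assumed final period for a five-frame sequence
]
for pair in decode_schedule(slots):
    print(pair)
# ('I1', 'E1'), ('B2', 'E2'), ('P3', 'E3'), ('B4', 'E4'), ('P5', 'E5')
```

The output pairs appear in display order even though the base layer frames arrive in a different order, because only base layer frames are held in memory.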

One example of an encoder according to the present 
invention is shown in FIG. 6. According to the present 
invention, the encoder will produce a stream of base layer 
frames and a stream of enhancement layer frames according 
to the transmission order shown in FIG. 3. 
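
How the two streams line up at the encoder output can be sketched as follows. This is an illustration, not the patent's implementation: it assumes frame names of the form I1, B2, P3, a fixed one-period delay for the enhancement stream as in FIG. 5, and a hypothetical function name `build_slots`.

```python
from itertools import zip_longest


def build_slots(display_order, delay=1):
    """Pair base frames (transmission order) with enhancement frames (display order).

    The enhancement stream is shifted by `delay` time periods relative to the base stream.
    """
    base, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)  # B-frames wait for the following anchor
        else:
            base.append(frame)
            base.extend(pending_b)
            pending_b = []
    base.extend(pending_b)
    # Enhancement frames keep the display order and are only delayed.
    enhancement = [None] * delay + ["E" + frame[1:] for frame in display_order]
    return list(zip_longest(base, enhancement))


for slot in build_slots(["I1", "B2", "P3", "B4", "P5"]):
    print(slot)
# ('I1', None), ('P3', 'E1'), ('B2', 'E2'), ('P5', 'E3'), ('B4', 'E4'), (None, 'E5')
```

Only the base layer is reordered; the enhancement layer is left in display order and shifted in time.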

As can be seen from FIG. 6, the encoder 28 includes a
base layer encoder 30 and an enhancement layer encoder 54.
The base layer encoder 30 includes a discrete cosine
transform (DCT) block 34, a quantization block 36 and an
entropy encoder 38 to encode the original video into I frames
and the motion compensated residuals into P and B frames.

The base layer encoder 30 also includes an inverse
quantization block 42, an IDCT block 44, an adder 46 and
a motion compensation block 48 connected to the other input
of the adder 46. During operation, these elements 42, 44, 46, 48
provide a decoded version of the current frame being coded,
which is stored in a frame memory 50.

A motion estimation block 52 is also included which 
produces the motion vectors from the current frame and a 
decoded version of the previous frame stored in the frame 
memory 50. The use of the decoded version of the previous 
frame enables the motion compensation performed on the 
decoder side to be more accurate since it is the same as 
received on the decoder side. 

As can be further seen, the output of the motion
compensation block 48 is also connected to one side of the subtracter
32. This enables motion compensated residuals based on
predictions from previously transmitted coded frames to be
subtracted from the current frame being coded. A
multiplexer 40 is also included to combine the outputs of the
entropy encoder 38 and the motion estimation block 52 to
form the base layer stream.

The enhancement layer encoder 54 includes another
subtracter 62. The subtracter 62 is utilized to subtract the output
of the inverse quantization block 42 from the output of the
DCT block 34 in order to form residual images. A fine
granular scalable (FGS) encoder 58 is also included to
encode the residual images produced by the subtracter 62.
The residual images are encoded by performing bit-plane
DCT scanning and entropy encoding. A frame memory 56 is
connected to the FGS encoder 58, which is utilized to store
each of the bit-planes after being decoded. After all of the
bit-planes of the current frame are decoded, the frame
memory 56 will output that frame.

As can be further seen, another frame memory 60 is
connected to the output of the FGS encoder 58. According
to the present invention, the frame memory 60 rearranges the
enhancement layer frames into the transmission order shown
in FIG. 3. In order to perform the rearrangement of the
enhancement layer frames, the encoded enhancement layer
frames are stored in the frame memory 60 and then
transmitted according to the timing shown in FIG. 5.

As previously described, the transmission order of the
enhancement layer frames is the same order as the frames
are displayed on the decoder side. This is significant since it
eliminates the need for one of the frame memories on the
decoder-side, which is desirable for mobile and other low
power applications.

According to the present invention, in addition to the
applicability of the present invention to enhancement-layers
with no inter-enhancement prediction, the present invention
is also applicable to the case where single direction
prediction (i.e., no bi-directional MC prediction) is used with the
enhancement layer. An example of this scenario is shown in
FIG. 7.

The present invention is also applicable in the case where
multiple enhancement layers are used on the top of the base
layer. In this case, each of the enhancement layers can either
have no intra-enhancement-layer prediction or have a
single-direction prediction (from that enhancement layer or any
other layer with the overall layered-coding structure).

One example of a system in which the present invention
may be implemented is shown in FIG. 8. By way of
example, the system may represent a television, a set-top
box, a desktop, laptop or palmtop computer, a personal
digital assistant (PDA), a video/image storage device such
as a video cassette recorder (VCR), a digital video recorder
(DVR), a TiVO device, etc., as well as portions or
combinations of these and other devices. The system includes one
or more video sources 64, one or more input/output devices
74, a processor 66 and a memory 68.

The video/image source(s) 64 may represent, e.g., a
television receiver, a VCR or other video/image storage
device. The source(s) 64 may alternatively represent one or
more network connections for receiving video from a server
or servers over, e.g., a global computer communications
network such as the Internet, a wide area network, a
metropolitan area network, a local area network, a terrestrial
broadcast system, a cable network, a satellite network, a
wireless network, or a telephone network, as well as portions
or combinations of these and other types of networks.

The input/output devices 74, processor 66 and memory 68
communicate over a communication medium 72. The
communication medium 72 may represent, e.g., a bus, a
communication network, one or more internal connections of a
circuit, circuit card or other device, as well as portions and
combinations of these and other communication media.
Input video data from the source(s) 64 is processed in
accordance with one or more software programs stored in
memory 68 and executed by processor 66 in order to
generate output video/images supplied to a display device
70.

In a preferred embodiment, the coding, decoding and
rearranging of the enhancement layer frames described in
conjunction with FIGS. 3-8 is implemented by computer
readable code executed by the system. The code may be
stored in the memory 68 or read/downloaded from a
memory medium such as a CD-ROM or floppy disk. In other
embodiments, hardware circuitry may be used in place of, or
in combination with, software instructions to implement the
invention. For example, the elements shown in FIGS. 4 and
6 also can be implemented as discrete hardware elements.

While the present invention has been described above in
terms of specific examples, it is to be understood that the
invention is not intended to be confined or limited to the
examples disclosed herein. It should be noted that the
application of the framework described herein goes beyond
the examples shown in the figures. The present invention is
applicable to all schemes employing motion compensation
(MC) at the base-layer and having an enhancement-layer
without MC (i.e., intra coded). Therefore, this mechanism
can be applied to all scalable schemes where no
bi-directional prediction is done within the enhancement-layer
(i.e., with no intra-enhancement-layer prediction) or
single direction prediction MC.

Further, the present invention is adaptable to any coding
algorithm used for the enhancement-layer residual
(progressive coding or normal quantization, wavelet or DCT,
etc.). Examples of such enhancement-layer coding schemes
are the MPEG-4 Fine-Granular-Scalability (FGS) method
and the SNR scalability of MPEG-2, where no prediction in
the enhancement layer is used.

What is claimed is:

1. A method for decoding a video signal including a base
layer and an enhancement layer, wherein the enhancement
layer includes enhancement frames arranged in a display
order, the method comprising the steps of:

decoding the base layer to produce decoded base layer
frames;

decoding the enhancement layer to produce decoded
enhancement layer frames;

combining the decoded base layer frames with the
decoded enhancement layer frames without storing any
of the decoded enhancement layer frames to form video
frames.

2. The method according to claim 1, wherein the display
order is an order the video frames are in when being
displayed.

3. The method according to claim 1, wherein the display
order of the enhancement frames includes an enhancement
frame corresponding to a B-frame being placed between an
enhancement frame corresponding to an I-frame and an
enhancement frame corresponding to at least one P-frame.

4. The method according to claim 1, wherein the display
order includes an enhancement frame corresponding to a
B-frame being placed between enhancement frames
corresponding to P-frames.

5. The method according to claim 1, wherein the
enhancement layer frames are delayed with respect to the base layer.

6. A memory medium including a code for decoding a
video signal including a base layer and an enhancement
layer, wherein the enhancement layer includes enhancement
frames arranged in a display order, the code comprising:

a code to decode the base layer to produce decoded base
layer frames;

a code to decode the enhancement layer to produce
decoded enhancement layer frames;

a code to combine the decoded base layer frames with the
decoded enhancement layer frames without storing any
of the decoded enhancement layer frames to form video
frames.

7. An apparatus for decoding a video signal including a
base layer and an enhancement layer, wherein the
enhancement layer includes enhancement frames arranged in a
display order, the code comprising:

a first decoder to decode the base layer to produce
decoded base layer frames;

a second decoder to decode the enhancement layer to
produce decoded enhancement layer frames;

an adder to combine the decoded base layer frames with
the decoded enhancement layer frames without storing
any of the decoded enhancement layer frames to form
video frames.